GerNED: A German Corpus for Named Entity Disambiguation
نویسندگان
چکیده
Determining the real-world referents for name mentions of persons, organizations and other named entities in texts has become an important task in many information retrieval scenarios and is referred to as Named Entity Disambiguation (NED). While comprehensive datasets support the development and evaluation of NED approaches for English, there are no public datasets to assess NED systems for other languages, such as German. This paper describes the construction of an NED dataset based on a large corpus of German news articles. The dataset is closely modeled on the datasets used for the Knowledge Base Population tasks of the Text Analysis Conference, and contains gold standard annotations for the NED tasks of Entity Linking, NIL Detection and NIL Clustering. We also present first experimental results on the new dataset for each of these tasks in order to establish a baseline for future research efforts.
منابع مشابه
Named Entity Disambiguation for German News Articles
Named entity disambiguation has become an important research area providing the basis for improving search engine precision and for enabling semantic search. Current approaches for the named entity disambiguation are usually based on exploiting structured semantic and lingual resources (e.g. WordNet, DBpedia). Unfortunately, each of these resources cover independently from each other insufficie...
متن کاملIsaac Bloomberg Meets Michael Bloomberg: Better EntityDisambiguation for the News
This paper shows the implementation and evaluation of the Entity Linking or Named Entity Disambiguation system used and developed at Bloomberg. In particular, we present and evaluate a methodology and a system that do not require the use of Wikipedia as a knowledge base or training corpus. We present how we built features for disambiguation algorithms from the Bloomberg News corpus, and how we ...
متن کاملپیکره اعلام: یک پیکره استاندارد واحدهای اسمی برای زبان فارسی
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
متن کاملMutual Disambiguation for Entity Linking
The disambiguation algorithm presented in this paper is implemented in SemLinker, an entity linking system. First, named entities are linked to candidate Wikipedia pages by a generic annotation engine. Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document. The evaluation is based on experiments conducted on the test corpu...
متن کاملPAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
متن کامل